The Classification and Mixture Maximum Likelihood
Approaches to Cluster Analysis*

G.J.  McLachlan 

1.  INTRODUCTION 

A common and very old problem in statistics is the separation of a
heterogeneous population into more homogeneous subpopulations. We
concentrate here on the situation where the population of interest, $\Pi$, is
known or assumed to consist of, say, $k$ different subpopulations $\Pi_1,\dots,\Pi_k$,
and where the density of a $p$-dimensional observation $x$ from $\Pi_i$ is
known or assumed to be $f_i(x;\theta)$ for some unknown vector of parameters
$\theta$ $(i=1,\dots,k)$. In this context the problem may be formulated as follows:

Given a random sample of observations $x_1,\dots,x_n$ from $\Pi$, attempt to
allocate each $x_j$ to the subpopulation to which it belongs. We let
$\gamma' = (\gamma_1,\dots,\gamma_n)$ denote the set of identifying labels, where
$\gamma_j = i$ if $x_j$ comes from $\Pi_i$. This would be the classical discrimination
problem if $\gamma$ were known a priori; a discrimination procedure would be
formed from the classified sample for the allocation of subsequent
observations of unknown origin.

In what is sometimes called the classification maximum likelihood
procedure, $\theta$ and $\gamma$ are chosen to maximize

$$L_C(x_1,\dots,x_n;\theta,\gamma) = \prod_{j=1}^{n} f_{\gamma_j}(x_j;\theta). \qquad (1.1)$$

The maximization is over the set of values of $\gamma$ corresponding to
all possible assignments of the $x_j$ to the various subpopulations,
as well as over all admissible values of $\theta$. The estimates of $\theta$
and $\gamma$ so obtained are denoted by $\hat\theta$ and $\hat\gamma$ respectively. The
observations $x_1,\dots,x_n$ are then classified according to the estimates
$\hat\gamma_1,\dots,\hat\gamma_n$; for example, $x_j$ is assigned to $\Pi_g$ if $\hat\gamma_j = g$. This
procedure has been considered by several authors, including Hartley and Rao [14],
John [17], Scott and Symons [31], and Sclove [30]. Unfortunately, with
this procedure the $\gamma_j$ increase in number with the number of
observations, and under such conditions the maximum likelihood estimates need
not be consistent. Marriott [23] pointed out that under the standard
assumption of normal distributions with common covariance matrices, this
procedure gives definitely inconsistent estimates of the parameters
involved. More recently, Bryant and Williamson [4] extended Marriott's
results and showed that the method may be expected to give asymptotically
biased results quite generally.

*To appear in Vol. II of the Handbook of Statistics (edited by P.R.
Krishnaiah and L. Kanal).

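A related approach is the mixture maximum likelihood method
considered by Day [5] and Wolfe [34], among many others. With this
approach $x_1,\dots,x_n$ are assumed to be a random sample of size $n$
from a mixture of $\Pi_1,\dots,\Pi_k$ in the proportions $\varepsilon' = (\varepsilon_1,\dots,\varepsilon_k)$.
Hence the likelihood

$$L_M(x_1,\dots,x_n;\theta,\varepsilon) = \prod_{j=1}^{n} \Big\{ \sum_{i=1}^{k} \varepsilon_i f_i(x_j;\theta) \Big\} \qquad (1.2)$$

can be formed; the estimates of $\theta$ and $\varepsilon$ obtained by maximizing
(1.2) are denoted by $\hat\theta$ and $\hat\varepsilon$ respectively. Each $x_j$ can then be
classified on the basis of the estimated posterior probabilities
$\hat P_{ij}$ $(i=1,\dots,k)$, formed by replacing $\theta$ and $\varepsilon$ with $\hat\theta$ and $\hat\varepsilon$ in
the posterior probability that $x_j$ belongs to $\Pi_i$,

$$P_{ij} = \varepsilon_i f_i(x_j;\theta) \Big/ \sum_{r=1}^{k} \varepsilon_r f_r(x_j;\theta).$$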

It can be seen that the mixture approach is equivalent to the
classification procedure with the additional assumption that
$\gamma_1,\dots,\gamma_n$ is an (unobservable) random sample from a probability
distribution with mass $\varepsilon_i$ at $i$ $(i=1,\dots,k)$. It appears to avoid
the asymptotic biases associated with the classification procedure,
in which, at each step in the iterative process of computing the maximum
likelihood estimates, each $x_j$ is assigned outright to a particular
subpopulation according to the estimate of $\gamma_j$. By contrast, the mixture
approach does not insist on definite membership of any subpopulation;
rather, it gives an estimated probability of membership of each subpopulation.
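The contrast between (1.1) and (1.2) can be made concrete for fixed parameter values. The sketch below assumes univariate normal subpopulations with a common variance and made-up data and parameters; the function names are hypothetical. For fixed $\theta$, the labels that maximize (1.1) simply pick the largest density for each $x_j$, whereas (1.2) yields soft posterior probabilities $P_{ij}$.

```python
import numpy as np

def log_densities(x, mus, sigma2):
    # n-by-k matrix of log f_i(x_j; theta): univariate normals, common variance sigma2
    x = np.asarray(x, dtype=float)[:, None]
    mus = np.asarray(mus, dtype=float)[None, :]
    return -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (x - mus) ** 2 / sigma2

def classification_loglik(x, mus, sigma2, labels):
    # log of (1.1): sum_j log f_{gamma_j}(x_j; theta) for a given labelling gamma
    logf = log_densities(x, mus, sigma2)
    return logf[np.arange(len(labels)), labels].sum()

def mixture_loglik_and_posteriors(x, mus, sigma2, eps):
    # log of (1.2) together with the n-by-k matrix of posterior probabilities P_ij
    logw = log_densities(x, mus, sigma2) + np.log(eps)[None, :]
    top = logw.max(axis=1, keepdims=True)              # log-sum-exp for stability
    log_mix = top[:, 0] + np.log(np.exp(logw - top).sum(axis=1))
    post = np.exp(logw - log_mix[:, None])
    return log_mix.sum(), post

# Illustrative (made-up) data and parameter values.
x = np.array([-1.2, -0.3, 0.4, 2.1, 2.8])
mus, sigma2, eps = [0.0, 2.5], 1.0, np.array([0.5, 0.5])

# For fixed theta, (1.1) is maximized by labelling each x_j with its largest density.
best_labels = log_densities(x, mus, sigma2).argmax(axis=1)
ll_c = classification_loglik(x, mus, sigma2, best_labels)
ll_m, post = mixture_loglik_and_posteriors(x, mus, sigma2, eps)
print(best_labels)       # [0 0 0 1 1]
print(post.round(3))     # soft memberships instead of outright assignments
```

With equal proportions, $L_M$ always lies between $(1/2)^n$ times the maximized $L_C$ and the maximized $L_C$ itself, since each mixture term is an average of the $k$ densities.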

Note  that  another  approach  to  this  problem  is  to  proceed  further  and 
adopt  a  Bayesian  procedure  in  which  all  parameters  are  random  variables 
(Binder  [2],  Symons  [32]). 

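A common assumption in practice is to adopt the normality model

$$x_j \sim N(\mu_i, \Sigma) \ \text{in} \ \Pi_i \qquad (i=1,\dots,k). \qquad (1.3)$$

In this case $\theta$ has $\tfrac12 p(p+2k+1)$ elements, comprising the components
of the $k$ mean vectors $\mu_i$ and the distinct elements of the common
covariance matrix $\Sigma$ (that is, $kp$ mean components plus $\tfrac12 p(p+1)$
covariance elements), and the density $f_i(x;\theta)$ is given by

$$f(x;\mu_i,\Sigma) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2} \exp\{-\tfrac12 (x-\mu_i)'\Sigma^{-1}(x-\mu_i)\}.$$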

We now proceed to consider the application of the classification and
mixture approaches under the normality model (1.3), which is assumed
to hold through to Section 5, where the condition of a common covariance
matrix is relaxed to cover the general case of unequal covariance
matrices.


2.  CLASSIFICATION  APPROACH 


In principle the maximization process for the classification maximum
likelihood procedure can be carried out, since it is just a matter of
computing the maximum value of the likelihood (1.1) over all possible
partitions of the $n$ observations among the $k$ subpopulations. However,
unless $n$ is quite small, searching over all possible partitions is
prohibitive. Since, for fixed $\theta$, (1.1) is maximized by assigning each
$x_j$ to the subpopulation with the largest density, it follows that
$\hat\gamma_j = g$ if

$$f(x_j;\hat\mu_g,\hat\Sigma) \ge f(x_j;\hat\mu_i,\hat\Sigma) \qquad (i=1,\dots,k), \qquad (2.1)$$

where $\hat\mu_i$ and $\hat\Sigma$ are the ordinary maximum likelihood estimates of
$\mu_i$ and $\Sigma$ for a sample of normal observations classified according
to $\hat\gamma$. Hence the solution can be computed iteratively (John [17],
Sclove [30]). Starting with some initial clustering $\gamma$, the $\hat\mu_i$
and $\hat\Sigma$ are estimated accordingly and then used to give a new estimate
of $\gamma$ on the basis of (2.1), which is equivalent to allocating each observation
to the nearest cluster centre in terms of the estimated Mahalanobis distance.

Each step in the iterative process yields a value of the likelihood not
less than that at the previous step, and the iterations may be continued
until no observation changes clusters. Various starting values should be
taken in an attempt to locate the global solution. It will be seen in
the next section that the likelihood equations under the mixture approach
can be easily modified to be applicable also under the classification
approach. There are other procedures for finding the solution under the
classification approach; for example, the Mahalanobis distance version
of MacQueen's [20] k-means procedure, where the $\hat\mu_i$ and $\hat\Sigma$ are
re-estimated after each observation is allocated rather than waiting until
after all the observations have been allocated.
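The iterative scheme just described can be sketched as follows. This is a minimal illustration, assuming numpy, a user-supplied initial clustering coded $0,\dots,k-1$ with no empty cluster, and the equal weighting implicit in (2.1); the function names and simulated data are hypothetical. In practice it would be rerun from several starting partitions.

```python
import numpy as np

def _estimate(X, labels, k):
    # ML estimates for observations classified according to `labels`
    means = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    resid = X - means[labels]
    sigma = resid.T @ resid / len(X)          # pooled within-cluster covariance (divisor n)
    return means, sigma

def classification_ml(X, labels, max_iter=100):
    """Iterate (2.1): re-estimate the means and pooled covariance matrix, then move
    each observation to the nearest centre in estimated Mahalanobis distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int).copy()
    k = labels.max() + 1
    for _ in range(max_iter):
        means, sigma = _estimate(X, labels, k)
        diff = X[:, None, :] - means[None, :, :]
        # squared Mahalanobis distance of each x_j from each estimated centre
        d2 = np.einsum('jip,pq,jiq->ji', diff, np.linalg.inv(sigma), diff)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):      # no observation changed clusters
            break
        if np.bincount(new_labels, minlength=k).min() == 0:
            raise ValueError("a cluster became empty; try another starting partition")
        labels = new_labels
    means, sigma = _estimate(X, labels, k)
    return labels, means, sigma

# Illustrative simulated data: two groups and a deliberately crude start.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)), rng.normal([6, 6], 1, (50, 2))])
truth = np.repeat([0, 1], 50)
start = (X.sum(axis=1) > 11).astype(int)    # misplaces some points of the second group
labels, means, sigma = classification_ml(X, start)
```

Because all clusters share one covariance matrix and (2.1) carries no proportions, the largest density in (2.1) is the same as the smallest Mahalanobis distance, which is what the assignment step uses.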

For the classification approach applied under the normality model
(1.3), Scott and Symons [31] showed that $\hat\gamma$ corresponds to the partition
which minimizes the determinant of the pooled within-subpopulations sum
of squares matrix

$$W = \sum_{i=1}^{k} W_i,$$

where

$$W_i = \sum_{q=1}^{n_i} (x_{iq} - \bar x_i)(x_{iq} - \bar x_i)',$$

and $x_{iq}$ $(q=1,\dots,n_i)$ denote the $n_i$ observations assigned to $\Pi_i$
according to $\gamma$ and $\bar x_i$ refers to their sample mean; see also
Friedman and Rubin [9], who originally suggested this criterion.

The minimization of $|W|$ would appear to be a reasonable clustering
criterion regardless of the underlying distributions. Marriott
[22] has given a comprehensive account of the properties of this
criterion. It does have a tendency to produce clusters of roughly
equal size, although the modified version

$$n\log|W| - 2\sum_{i=1}^{k} n_i \log n_i,$$

suggested recently by Symons [32], would appear to go some way towards
overcoming this.
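Both criteria are easy to evaluate for a given partition. The sketch below, assuming numpy and a tiny made-up one-dimensional sample, computes $|W|$ and Symons' modification and, because $n = 6$ here, searches all two-cluster partitions exhaustively; the function names are hypothetical, and such a search is exactly what becomes prohibitive for realistic $n$.

```python
import itertools
import numpy as np

def within_ss(X, labels):
    # W = sum_i W_i with W_i = sum_q (x_iq - xbar_i)(x_iq - xbar_i)'
    X = np.asarray(X, dtype=float)
    W = np.zeros((X.shape[1], X.shape[1]))
    for i in np.unique(labels):
        D = X[labels == i] - X[labels == i].mean(axis=0)
        W += D.T @ D
    return W

def friedman_rubin(X, labels):
    # |W|, to be minimized over partitions
    return np.linalg.det(within_ss(X, labels))

def symons(X, labels):
    # n log|W| - 2 sum_i n_i log n_i (Symons' modification)
    _, counts = np.unique(labels, return_counts=True)
    _, logdet = np.linalg.slogdet(within_ss(X, labels))
    return len(labels) * logdet - 2.0 * np.sum(counts * np.log(counts))

# Exhaustive search over two-cluster partitions: feasible only because n = 6.
X = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
best_crit, best_labels = np.inf, None
for tail in itertools.product([0, 1], repeat=len(X) - 1):
    labels = np.array((0,) + tail)            # x_1 fixed in cluster 0 (label switching)
    if np.bincount(labels, minlength=2).min() < 2:
        continue                              # skip clusters with fewer than two points
    crit = friedman_rubin(X, labels)
    if crit < best_crit:
        best_crit, best_labels = crit, labels
print(best_labels, best_crit)
```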


3. MIXTURE APPROACH

An excellent account of the computation of the maximum likelihood
estimates of $\mu_i$, $\Sigma$, and $\varepsilon$ for the mixture approach has been given by
Day [5]. Under the normality model (1.3), the posterior probabilities
$P_{ij}$ $(i=1,\dots,k;\ j=1,\dots,n)$ have the form

$$P_{ij} = \exp(a_i'x_j + b_i) \Big/ \sum_{r=1}^{k} \exp(a_r'x_j + b_r),$$

where

$$a_r = \Sigma^{-1}(\mu_r - \mu_1)$$

and

$$b_r = -\tfrac12 (\mu_r + \mu_1)'\Sigma^{-1}(\mu_r - \mu_1) + \log(\varepsilon_r/\varepsilon_1)$$

for $r=1,\dots,k$; that is, $a_1 = 0$ and $b_1 = 0$. The maximum likelihood
estimates are evaluated from the equations


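$$\hat\varepsilon_i = \sum_{j=1}^{n} \hat P_{ij}/n, \qquad (3.1)$$

$$\hat\mu_i = \sum_{j=1}^{n} \hat P_{ij}x_j \Big/ (n\hat\varepsilon_i), \qquad (3.2)$$

and

$$\hat\Sigma = \sum_{i=1}^{k}\sum_{j=1}^{n} (\hat P_{ij}/n)(x_j - \hat\mu_i)(x_j - \hat\mu_i)', \qquad (3.3)$$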

which can be solved iteratively by substituting some initial values
for the estimates into the right-hand side of (3.1) to (3.3) to
produce new estimates on the left-hand side, which are then substituted
into the right-hand side, and so on. These iterative estimates
can be identified with those obtained by directly applying the
so-called EM algorithm of Dempster et al. [6], which shows that the
estimates will converge to a local maximum irrespective of the
starting point. The iterative process should be started from several
points in an attempt to ensure that the global maximum is obtained.
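A minimal numpy sketch of this iteration follows: the E-step computes the $\hat P_{ij}$ from the current estimates and the M-step applies (3.1) to (3.3). The function name, simulated data and starting values are illustrative assumptions, and a serious implementation would add multiple restarts as recommended above.

```python
import numpy as np

def em_common_cov(X, eps, mus, sigma, max_iter=500, tol=1e-9):
    """EM iterations for a k-component normal mixture with a common covariance
    matrix, using (3.1)-(3.3) as the M-step. Returns the final estimates and the
    log-likelihood recorded at the start of each iteration."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    eps, mus, sigma = (np.asarray(v, dtype=float) for v in (eps, mus, sigma))
    history = []
    for _ in range(max_iter):
        # E-step: log(eps_i f(x_j; mu_i, Sigma)) and posterior probabilities P_ij
        _, logdet = np.linalg.slogdet(sigma)
        diff = X[:, None, :] - mus[None, :, :]
        d2 = np.einsum('jip,pq,jiq->ji', diff, np.linalg.inv(sigma), diff)
        logw = np.log(eps)[None, :] - 0.5 * (p * np.log(2 * np.pi) + logdet + d2)
        top = logw.max(axis=1, keepdims=True)
        log_mix = top[:, 0] + np.log(np.exp(logw - top).sum(axis=1))
        P = np.exp(logw - log_mix[:, None])
        history.append(log_mix.sum())
        # M-step
        eps = P.mean(axis=0)                                        # (3.1)
        mus = (P.T @ X) / (n * eps)[:, None]                        # (3.2)
        diff = X[:, None, :] - mus[None, :, :]
        sigma = np.einsum('ji,jip,jiq->pq', P, diff, diff) / n      # (3.3)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return eps, mus, sigma, history

# Illustrative simulated data: 0.6 N((0,0), I) + 0.4 N((3,0), I).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1, (300, 2)), rng.normal([3, 0], 1, (200, 2))])
eps, mus, sigma, hist = em_common_cov(X, [0.5, 0.5], [[-1, 0], [4, 0]], np.cov(X.T))
```

The recorded log-likelihoods should never decrease from one iteration to the next, which is the EM property referred to above.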


Day [5] has shown that considerable computing time can be saved
for $k = 2$ by reparametrizing the likelihood in terms of $a$, $b$, $m$,
and $V$, where

$$m = \varepsilon_1\mu_1 + \varepsilon_2\mu_2$$

and

$$V = \Sigma + \varepsilon_1\varepsilon_2(\mu_1 - \mu_2)(\mu_1 - \mu_2)'$$

are the mean and covariance matrix of the mixture distribution; $a$
and $b$ denote $a_2$ and $b_2$ with their subscripts suppressed since
$k = 2$ only. The maximum likelihood equations can now be written as


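$$\hat m = \sum_{j=1}^{n} x_j/n, \qquad (3.4)$$

$$\hat V = \sum_{j=1}^{n} (x_j - \hat m)(x_j - \hat m)'/n, \qquad (3.5)$$

$$\hat a = \hat V^{-1}(\hat\mu_2 - \hat\mu_1) \Big/ \{1 - \hat\varepsilon_1\hat\varepsilon_2(\hat\mu_1 - \hat\mu_2)'\hat V^{-1}(\hat\mu_1 - \hat\mu_2)\}, \qquad (3.6)$$

and

$$\hat b = -\tfrac12 \hat a'(\hat\mu_1 + \hat\mu_2) + \log(\hat\varepsilon_2/\hat\varepsilon_1). \qquad (3.7)$$

Since $\hat m$ and $\hat V$ are given explicitly, only the values of $\hat a$ and $\hat b$
need to be found iteratively in solving the above equations.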
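The $k = 2$ scheme can be sketched as below, assuming numpy; the function name, simulated data and starting values for $a$ and $b$ are illustrative. With $a_1 = 0$ and $b_1 = 0$, the posterior probability $P_{2j}$ is the logistic function of $a'x_j + b$. $\hat V$ is inverted once, and each pass obtains $\hat\varepsilon_i$ and $\hat\mu_i$ from the current $\hat P_{ij}$ via (3.1) and (3.2) before updating $\hat a$ and $\hat b$ by (3.6) and (3.7).

```python
import numpy as np

def logistic(z):
    # 1 / (1 + exp(-z)), written with tanh to avoid overflow warnings
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def day_two_groups(X, a, b, max_iter=1000, tol=1e-10):
    """Solve (3.1), (3.2) and (3.4)-(3.7) for k = 2: m and V are computed once,
    and only a and b are updated. P_2j = logistic(a'x_j + b), P_1j = 1 - P_2j."""
    X = np.asarray(X, dtype=float)
    a = np.asarray(a, dtype=float)
    n = len(X)
    m = X.mean(axis=0)                                     # (3.4)
    V_inv = np.linalg.inv((X - m).T @ (X - m) / n)         # inverse of (3.5)
    for _ in range(max_iter):
        P2 = logistic(X @ a + b)
        P1 = 1.0 - P2
        e1, e2 = P1.mean(), P2.mean()                      # (3.1)
        mu1, mu2 = P1 @ X / (n * e1), P2 @ X / (n * e2)    # (3.2)
        d = mu1 - mu2
        a_new = V_inv @ (mu2 - mu1) / (1.0 - e1 * e2 * (d @ V_inv @ d))   # (3.6)
        b_new = -0.5 * a_new @ (mu1 + mu2) + np.log(e2 / e1)               # (3.7)
        step = np.max(np.abs(a_new - a)) + abs(b_new - b)
        a, b = a_new, b_new
        if step < tol:
            break
    return a, b, (e1, e2), (mu1, mu2)

# Illustrative data: 0.6 N((0,0), I) + 0.4 N((3,0), I); true a = (3, 0).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 1, (300, 2)), rng.normal([3, 0], 1, (200, 2))])
a, b, (e1, e2), (mu1, mu2) = day_two_groups(X, a=[1.0, 0.0], b=-1.5)
```

Because $V = \Sigma + \varepsilon_1\varepsilon_2 dd'$ with $d = \mu_1 - \mu_2$, (3.6) is just $\Sigma^{-1}(\mu_2 - \mu_1)$ rewritten with the Sherman-Morrison identity, so this iteration should reproduce the common-covariance EM steps while avoiding a separate $\hat\Sigma$ update.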
To obtain suitable initial values of $a$ and $b$, it is suggested that,
for various bivariate subsets of the variables, the data points be plotted
and a line drawn which divides the data into two groups, each with a
scatter that appears normal (see, for example, O'Neill [28] and
Ganesalingam and McLachlan [12]). Estimates of $a$ and $b$ can be
formed on the basis of this subdivision, proceeding as if the observations
were correctly classified. There appears to be no difficulty in
locating the global maximum for $p = 1$ and 2, but for $p \ge 3$ there
are problems with multiple maxima, particularly for small values (less
than two, say) of the Mahalanobis distance between $\Pi_1$ and $\Pi_2$,

$$\Delta = \{(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2)\}^{1/2},$$

when $n$ is not large (Day [5]). Also, it is well known (Day [5] and
Hosmer [16]) that maximum likelihood estimates based on a mixture of
normal distributions are very poor unless $n$ is very large (for
example, $n > 500$). However, Ganesalingam and McLachlan [11] found
that although the maximum likelihood estimates $\hat a$ and $\hat b$ may not
be very reliable for small $n$, the proportions in which the components
of $\hat a$ and $\hat b$ occur appear to be such that the resulting
discriminant function $\hat a'x + \hat b$ may still provide reasonable separation
between the subpopulations.

Note that the same set of equations can be used as follows
to compute the estimates $\hat\mu_i$, $\hat\Sigma$, and $\hat\gamma$ under the classification approach.
At a given step $\hat\gamma_j$ is put equal to that $g$ for which $\hat P_{gj} \ge \hat P_{ij}$
$(i=1,\dots,k)$, where, in the $\hat P_{ij}$, $\hat b_r$ is used without the
$\log(\hat\varepsilon_r/\hat\varepsilon_1)$ term. Then on the next step the $\hat\mu_i$ and $\hat\Sigma$ are computed
from (3.1) to (3.3) in which, for each $j$, $\hat P_{ij}$ is replaced by 1
$(i = g)$ and 0 $(i \ne g)$. The transformed equations (3.4) to (3.7) for
$k = 2$ are also applicable to the classification approach with the above
modifications; that is, the term corresponding to $\hat\varepsilon_i$ in (3.6) is given
by $n_i/n$ $(i=1,2)$, while there is no term corresponding to $\log(\hat\varepsilon_2/\hat\varepsilon_1)$ in
(3.7).
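A sketch of one such classification step, assuming numpy, made-up data and a hypothetical function name: the $\hat P_{ij}$ are replaced by 0/1 indicators in (3.1) to (3.3), and new labels maximize $a_r'x_j + b_r$ with the log-proportion term dropped. This is the same rule as the nearest-centre Mahalanobis allocation of Section 2, expressed through the mixture equations.

```python
import numpy as np

def classification_step(X, labels, k):
    """One step of the classification approach written in terms of (3.1)-(3.3):
    each P_ij is replaced by 1 (i = current label of x_j) and 0 otherwise, and the
    new label maximizes a_r'x_j + b_r with the log(eps_r/eps_1) term dropped."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    P = np.eye(k)[labels]                                  # hard 0/1 "posteriors"
    eps = P.mean(axis=0)                                   # (3.1): n_i / n
    mus = (P.T @ X) / (n * eps)[:, None]                   # (3.2): cluster means
    diff = X[:, None, :] - mus[None, :, :]
    sigma = np.einsum('ji,jip,jiq->pq', P, diff, diff) / n # (3.3): W / n
    S_inv = np.linalg.inv(sigma)
    A = (mus - mus[0]) @ S_inv                             # row r is a_r' (a_1 = 0)
    B = -0.5 * np.sum(((mus + mus[0]) @ S_inv) * (mus - mus[0]), axis=1)  # b_r, no log term
    return (X @ A.T + B).argmax(axis=1), eps, mus, sigma

# Illustrative data: three groups of 20, started from their true labels.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2)) + np.repeat([[0.0, 0.0], [2.0, 1.0], [0.0, 3.0]], 20, axis=0)
labels = np.repeat([0, 1, 2], 20)
for _ in range(20):                          # iterate until no observation changes clusters
    new_labels, eps, mus, sigma = classification_step(X, labels, 3)
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels
```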

A simulation study undertaken by Ganesalingam and McLachlan [13]
for $k = 2$ suggests that overall the mixture approach performs quite
favourably relative to the classification approach, even where mixture
sampling does not apply. The apparent slight superiority of the latter
approach for samples with subpopulations represented in approximately
equal numbers is more than offset by its inferior performance for
disparate representations.

4.  EFFICIENCY  OF  THE  MIXTURE  APPROACH 

We consider now the efficiency of the mixture approach for $k = 2$
normal subpopulations, contrasting the asymptotic theory with small-sample
results available from simulation.

For a mixture of two univariate normal distributions, Ganesalingam
and McLachlan [10] studied the asymptotic efficiency of the mixture
approach relative to the classical discrimination procedure (appropriate
for known $\gamma$) by considering the ratio

$$e = \{E(R) - R_0\}\big/\{E(R_M) - R_0\}, \qquad (4.1)$$

where $E(R_M)$ and $E(R)$ denote the unconditional error rates of the
mixture and classical procedures respectively, applied to an unclassified
observation subsequent to the initial sample, and $R_0$ denotes
their common limiting value as $n \to \infty$. The asymptotic relative
efficiency was obtained by evaluating the numerator and denominator
of (4.1) up to and including terms of order $1/n$. The multivariate
analogue of this problem was considered independently by O'Neill
[28]. By definition the asymptotic relative efficiency does not
depend on $n$, and O'Neill [28] showed that it also does not depend
on $p$ for equal prior probabilities, $\varepsilon_1 = \varepsilon_2 = 0.5$. The asymptotic
values of $e$ are displayed in Table 1 as percentages for selected
combinations of $\Delta$, $\varepsilon_1$, $p$, and $n$; the corresponding values of $e$
obtained from simulation are extracted from Ganesalingam and McLachlan
[11] and listed below them in parentheses. It can be seen that the asymptotic
relative efficiency does not give a reliable guide to the true
relative efficiency when $n$ is small, particularly for $\Delta = 1$. This
is not surprising, since the asymptotic theory of maximum likelihood
for this problem requires $n$ to be very large before it applies (Day
[5], Hosmer [16]). Further simulation studies by Ganesalingam and
McLachlan [11] in the univariate case indicate that the asymptotic
relative efficiency gives reliable predictions at least for $n > 100$
and $\Delta > 2$.
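Evaluating (4.1) needs $R_0$. Under (1.3) with $k = 2$, both procedures tend to the optimal rule, so $R_0$ is its error rate; the closed form used below is a standard result assumed here rather than one stated in the text. The function names are hypothetical and the error rates in the last call are made-up numbers, not results from Table 1.

```python
from math import erf, log, sqrt

def Phi(z):
    # standard normal distribution function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def optimal_error_rate(delta, eps1):
    """Error rate of the optimal rule for two normal subpopulations with a common
    covariance matrix, Mahalanobis distance delta > 0, proportions eps1, 1 - eps1."""
    c = log((1.0 - eps1) / eps1)               # log(eps2 / eps1)
    return (eps1 * Phi(-delta / 2 + c / delta)
            + (1.0 - eps1) * Phi(-delta / 2 - c / delta))

def relative_efficiency(err_classical, err_mixture, r0):
    # (4.1): e = {E(R) - R_0} / {E(R_M) - R_0}
    return (err_classical - r0) / (err_mixture - r0)

for delta in (1, 2, 3):
    print(delta, [round(optimal_error_rate(delta, e1), 4) for e1 in (0.25, 0.5)])

# Hypothetical expected error rates for the classical and mixture procedures:
r0 = optimal_error_rate(2, 0.5)
print(relative_efficiency(0.17, 0.20, r0))
```

A value of $e$ well below 1 means that the mixture procedure's excess error over $R_0$ is several times that of the classical procedure, which is the sense in which the simulated values in Table 1 are read below.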
The simulated values for the relative efficiency in Table 1
suggest that, for the mixture approach to perform comparably with the
classical discrimination procedure, it needs to be based on about two
to five times the number of initial observations, depending on the
combination of the parameters.

5.  UNEQUAL  COVARIANCE  MATRICES 

For normal subpopulations $\Pi_i$ with unequal covariance matrices,
the classification procedure has to be applied with the restriction
that at least $p + 1$ observations belong to each subpopulation,
to avoid the degenerate case of infinite likelihood.

The likelihood equations under the mixture approach are given by
(3.1) to (3.3) appropriately modified to allow for $k$ different covariance
matrices (Wolfe [34]). Unfortunately, maximum likelihood
estimation breaks down in practice, since each data point gives rise to
a singularity in the likelihood on the edge of the parameter space.
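The singularity is easy to exhibit numerically. In this sketch (numpy, evenly spaced made-up data, hypothetical function name), the first component of a two-component univariate mixture is centred on the data point $x_1$ and its standard deviation is shrunk; the term for $x_1$ then grows like $-\log\sigma_1$ while every other term stays bounded below by the second component.

```python
import numpy as np

def two_normal_loglik(x, eps1, mu1, s1, mu2, s2):
    # log of (1.2) for k = 2 univariate normals with unequal standard deviations s1, s2
    def dens(mu, s):
        return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return np.sum(np.log(eps1 * dens(mu1, s1) + (1 - eps1) * dens(mu2, s2)))

x = np.linspace(-2.0, 2.0, 21)            # illustrative, evenly spaced data
# Centre the first component on the data point x_1 and let its variance shrink.
sds = [1e-2, 1e-4, 1e-8, 1e-16]
lls = [two_normal_loglik(x, 0.5, x[0], s, 0.0, 1.0) for s in sds]
print(np.round(lls, 2))                    # grows without bound, roughly like -log(s1)
```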

This problem has received a good deal of attention recently. For a
mixture of two univariate normal distributions, Kiefer [18] has shown
that the likelihood equations have a root $\hat\phi$ which is a consistent,
asymptotically normal and efficient estimator of $\phi = (\theta', \varepsilon')'$. Quandt
and Ramsey [29] proposed the moment generating function (MGF) estimator
obtained by minimizing

$$\sum_{i=1}^{h} \Big\{\psi(t_i) - \sum_{j=1}^{n} e^{t_i x_j}/n\Big\}^2$$

for selected values $t_1,\dots,t_h$ of $t$ in some small interval $(c,d)$,
$c < 0 < d$, where

$$\psi(t) = \sum_{i=1}^{2} \varepsilon_i \exp(\mu_i t + \tfrac12 \sigma_i^2 t^2)$$

is the MGF of a mixture of two normal distributions with variances
$\sigma_1^2$ and $\sigma_2^2$. The usefulness of the MGF method would appear to be
that it provides a consistent estimate which can be used as a starting
value when applying the EM algorithm in an attempt to locate the root
of the likelihood equations corresponding to the consistent, asymptotically
efficient estimator. Bryant [3] suggests taking the classification
maximum likelihood estimate of $\phi$ as a starting value in the
likelihood equations.
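The Quandt and Ramsey objective can be written down directly. This sketch assumes numpy, simulated univariate data, eight illustrative points $t_i$ in $(-0.2, 0.2)$, and hypothetical function names; actually minimizing it over $(\varepsilon_1, \mu_1, \sigma_1, \mu_2, \sigma_2)$ would need a numerical optimizer, which is not shown. The usage lines only compare the objective at the true parameters with its value at a shifted $\mu_1$.

```python
import numpy as np

def mixture_mgf(t, eps1, mu1, s1, mu2, s2):
    # psi(t) = sum_i eps_i exp(mu_i t + sigma_i^2 t^2 / 2), two-component normal mixture
    return (eps1 * np.exp(mu1 * t + 0.5 * s1 ** 2 * t ** 2)
            + (1 - eps1) * np.exp(mu2 * t + 0.5 * s2 ** 2 * t ** 2))

def mgf_objective(params, x, t_grid):
    # sum_i {psi(t_i) - sum_j exp(t_i x_j) / n}^2 over the chosen points t_i
    empirical = np.exp(np.outer(t_grid, x)).mean(axis=1)
    return np.sum((mixture_mgf(t_grid, *params) - empirical) ** 2)

# Illustrative data: 0.4 N(0, 1) + 0.6 N(3, 0.5^2); t_i in (c, d) = (-0.2, 0.2).
rng = np.random.default_rng(4)
n = 5000
z = rng.random(n) < 0.4
x = np.where(z, rng.normal(0.0, 1.0, n), rng.normal(3.0, 0.5, n))
t_grid = np.linspace(-0.2, 0.2, 8)
true_params = (0.4, 0.0, 1.0, 3.0, 0.5)      # (eps1, mu1, s1, mu2, s2)
shifted = (0.4, 0.5, 1.0, 3.0, 0.5)
print(mgf_objective(true_params, x, t_grid), mgf_objective(shifted, x, t_grid))
```

Keeping $(c, d)$ small keeps $e^{t x_j}$ well behaved, which is presumably why the method restricts $t$ to a small interval around zero.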

The robustness of the mixture approach based on normality as a
clustering procedure requires investigation. A recent case study by
Hernandez-Alvi [15] suggests that, at least in the case where the
variables are in the form of proportions, the mixture approach may be
reasonably robust, from a clustering point of view, in separating samples
in the presence of multimodality.

6.  UNKNOWN  NUMBER  OF  SUBPOPULATIONS 

Frequently with the application of clustering techniques there is
the difficult problem of deciding how many subpopulations, $k$, there
are. A review of this problem has been given by Everitt [8]; see also
Engelman and Hartigan [7] and Lee [19]. With respect to the classification
approach, Marriott [21] has suggested taking $k$ to be the number
which minimizes $k^2|W|$. For heterogeneous covariance matrices there
may be some excessive subdivision, but this can be rectified by recombining
any two clusters which, by themselves, do not suggest that separation was
necessary.
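Since splitting a cluster can never increase $|W|$, $|W|$ alone cannot choose $k$, and the factor $k^2$ acts as a penalty. The sketch below (numpy, hypothetical function names, a made-up one-dimensional sample) evaluates $k^2|W|$ for one candidate partition per $k$; in practice those partitions would come from a clustering routine such as the one in Section 2, which is an assumption here.

```python
import numpy as np

def within_ss(X, labels):
    # pooled within-cluster sum of squares matrix W
    X = np.asarray(X, dtype=float)
    W = np.zeros((X.shape[1], X.shape[1]))
    for i in np.unique(labels):
        D = X[labels == i] - X[labels == i].mean(axis=0)
        W += D.T @ D
    return W

def marriott(X, labels):
    # k^2 |W|, with k the number of clusters present in `labels`
    k = len(np.unique(labels))
    return k ** 2 * np.linalg.det(within_ss(X, labels))

X = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0], [20.0], [20.5], [21.0]])
# Hypothetical best partitions for k = 1, 2, 3.
partitions = {1: np.zeros(9, dtype=int),
              2: np.repeat([0, 1], [3, 6]),
              3: np.repeat([0, 1, 2], 3)}
scores = {k: marriott(X, lab) for k, lab in partitions.items()}
print(scores)
```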

With the mixture approach the likelihood ratio test is an obvious
criterion for choosing the number of subpopulations. However, for
testing the hypothesis of, say, $k_1$ versus $k_2$ subpopulations
$(k_1 < k_2)$, it has been noted (Wolfe [35]) that some of the regularity
conditions are not satisfied that are needed for minus twice the
log-likelihood ratio to have, under the null hypothesis, an approximate
chi-square distribution with degrees of freedom equal to the difference
in the number of parameters in the two hypotheses. Wolfe [35] suggested
using a chi-square distribution with twice the difference in the number
of parameters (not including the proportions), which appears to be a
reasonable approximation (Hernandez-Alvi [15]).
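Wolfe's approximation always uses an even number of degrees of freedom, so the chi-square tail probability has an elementary closed form and needs only the standard library. The sketch assumes the common-covariance model (1.3), under which going from $k_1$ to $k_2$ components adds $(k_2 - k_1)p$ mean parameters; the function names and the log-likelihood values in the usage line are hypothetical.

```python
from math import exp, factorial

def chi2_sf_even(x, df):
    # upper-tail probability of a chi-square variate with even df = 2m:
    # exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    h = x / 2.0
    return exp(-h) * sum(h ** i / factorial(i) for i in range(df // 2))

def wolfe_lrt(loglik_k1, loglik_k2, k1, k2, p):
    """Minus twice the log-likelihood ratio for k1 versus k2 > k1 normal components
    with a common covariance matrix, referred to a chi-square distribution with twice
    the difference in the number of non-proportion parameters, 2 * (k2 - k1) * p."""
    stat = -2.0 * (loglik_k1 - loglik_k2)
    df = 2 * (k2 - k1) * p
    return stat, df, chi2_sf_even(max(stat, 0.0), df)

# Hypothetical maximized log-likelihoods for k = 1 and k = 2 with p = 2 variables.
print(wolfe_lrt(-510.0, -500.0, 1, 2, 2))
```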


7.  PARTIAL  CLASSIFICATION  OF  SAMPLE 


We now consider the situation where the classification of some of
the observations in the sample is initially known. This information can
be easily incorporated into the maximum likelihood procedures for the
classification and mixture approaches. If an $x_j$ is known to come from,
say, $\Pi_r$, then under the former approach $\hat\gamma_j = r$ always in the associated
iterative process while, under the latter, $\hat P_{ij}$ is set equal to 1 $(i = r)$
and 0 $(i \ne r)$ in all the iterations. In those situations where there
are sufficient data of known classification to form a reliable discrimination
rule, the unclassified data can be clustered simply according
to this rule and, for the classification approach, the results of
McLachlan [24,25] suggest this may be preferable unless the unclassified
data are in approximately the same proportion from each subpopulation.
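For the mixture approach the modification amounts to overwriting the rows of $\hat P_{ij}$ for the classified observations at every iteration. The sketch below assumes numpy, a univariate normal mixture with a common variance (so (3.3) reduces to a scalar), simulated data with ten labelled observations per group, and a hypothetical function name.

```python
import numpy as np

def em_partial(x, known, eps, mus, var, n_iter=300):
    """EM for a univariate normal mixture with common variance in which some
    observations have known origin: known[j] = r if x_j is known to come from
    subpopulation r (coded 0, ..., k-1) and -1 if x_j is unclassified."""
    x = np.asarray(x, dtype=float)
    eps, mus = np.asarray(eps, dtype=float), np.asarray(mus, dtype=float)
    k, fixed = len(mus), known >= 0
    for _ in range(n_iter):
        logw = np.log(eps) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x[:, None] - mus) ** 2 / var
        P = np.exp(logw - logw.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        P[fixed] = np.eye(k)[known[fixed]]     # P_ij = 1 (i = r), 0 (i != r) throughout
        eps = P.mean(axis=0)                                   # (3.1)
        mus = P.T @ x / P.sum(axis=0)                          # (3.2)
        var = np.sum(P * (x[:, None] - mus) ** 2) / len(x)     # (3.3) with p = 1
    return eps, mus, var, P

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.5, 1.0, 100)])
known = np.full(200, -1)
known[:10] = 0                                 # ten labelled observations from each group
known[100:110] = 1
eps, mus, var, P = em_partial(x, known, [0.5, 0.5], [-1.0, 3.5], 2.0)
```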

With the mixture approach a more efficient clustering of the unclassified
observations should be obtained by simultaneously using them in the
estimation of the subpopulation parameters, at least as $n \to \infty$, since
the procedure is asymptotically efficient. The question of whether it
is a worthwhile exercise to update a discrimination rule on the basis of
a limited number of unclassified observations has been considered recently
by McLachlan and Ganesalingam [26]. For other work on the updating problem
the reader is referred to Titterington [33], Murray and Titterington [27],
and Anderson [1].
TABLE 1

Asymptotic Versus Simulation Results for the
Relative Efficiency of the Mixture Approach
(asymptotic values of e as percentages; simulated values in parentheses)

          p=1, n=20           p=2, n=20           p=3, n=40
 Δ    ε1=0.25  ε1=0.50    ε1=0.25  ε1=0.50    ε1=0.25  ε1=0.50
 1      0.25     0.51       0.34     0.51       0.42     0.51
      (33.01)  (25.12)    (46.71)  (63.11)    (25.00)  (43.39)
 2      7.29    10.08       9.36    10.08      10.51    10.08
      (22.05)  (17.74)    (25.73)  (16.26)    (16.28)  (14.51)
 3     31.41    35.92      35.13    35.92      36.78    35.92
      (19.57)  (23.54)    (43.91)  (29.63)    (29.01)  (23.46)